March 2002 Journal on Educational Resources in Computing (JERIC), volume 2 issue 1
Publisher: ACM Press 

Full text available: pdf(493.35 KB) Additional Information: full citation, abstract, references, index terms

Modern processors increase their performance with complex microarchitectural mechanisms, which makes them 
more and more difficult to understand and evaluate. KScalar is a graphical simulation tool that facilitates the study
of such processors. It allows students to analyze the performance behavior of a wide range of processor 
microarchitectures: from a very simple in-order, scalar pipeline, to a detailed out-of-order, superscalar pipeline 
with non-blocking caches, speculative execution, and comp ... 
The programming language Euclid was intended for writing system programs that could be verifiable by state-of-
the-art verification methods. Since verification was not an explicit goal in the design of Pascal, it is not surprising
that this gave rise to differences between the two languages. The Euclid designers intended to change Pascal only
where it fell short of this goal. This paper examines differences in the two languages in the light of this objective. 
These differences are roughly grouped ... 
This updated course on simulating natural phenomena will cover the latest research and production techniques for
simulating most of the elements of nature. The presenters will provide movie production, interactive simulation,
and research perspectives on the difficult task of photorealistic modeling, rendering, and animation of natural
techniques and the latest physics-based simulation techni ...
Several ILP limit studies indicate the presence of considerable ILP across dynamically far-apart instructions in
program execution. This paper proposes a hardware mechanism, dynamic vectorization (DV), as a tool for quickly
building up a large logical instruction window. Dynamic vectorization converts repetitive dynamic instruction 
sequences into vector form, enabling the processing of instructions from beyond the corresponding program loop 
to be overlapped with the loop. This enables vec ... 
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to
predict future accesses with high accuracy. Our technique exploits the dependence relationships that exist
between loads that produce addresses and loads that consume these addresses. By identifying producer-consumer
pairs, we construct a compact internal representation for the associated structure and its traversal. To
achieve a prefetching effect, a small prefetch engine speculatively t ...
Dynamic superscalar processors execute multiple instructions out-of-order by looking for independent operations
within a large window. The number of physical registers within the processor has a direct impact on the size of
this window as most in-flight instructions require a new physical register at dispatch. A large multi-ported register
file helps improve the instruction-level parallelism (ILP), but may have a detrimental effect on clock speed, 
especially in future wire-limited technologies. ... 

9 The family of concurrent logic programming languages
Ehud Shapiro 

September 1989 ACM Computing Surveys (CSUR), volume 21 issue 3 
Publisher: ACM Press 

Full text available: pdf(9.62 MB) Additional Information: full citation, abstract, references, citings, index terms

Concurrent logic languages are high-level programming languages for parallel and distributed systems that offer a
wide range of both known and novel concurrent programming techniques. Being logic programming languages, 
they preserve many advantages of the abstract logic programming model, including the logical reading of 
programs and computations, the convenience of representing data structures with logical terms and manipulating 
them using unification, and the amenability to metaprogrammin ... 
Full text available: pdf(147.55 KB) Additional Information: full citation, abstract, references, citings, index terms

Communication in cache-coherent distributed shared memory (DSM) often requires invalidating (or writing back) 
cached copies of a memory block, incurring high overheads. This paper proposes Last-Touch Predictors (LTPs) that
learn and predict the "last touch" to a memory block by one processor before the block is accessed and 
subsequently invalidated by another. By predicting a last-touch and (self-)invalidating the block in advance, an 
LTP hides the inval ... 

11 Real-time shading 
August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04
Publisher: ACM Press 

Full text available: pdf(7.39 MB) Additional Information: full citation, abstract

Real-time procedural shading was once seen as a distant dream. When the first version of this course was offered
four years ago, real-time shading was possible, but only with one-of-a-kind hardware or by combining the effects
of tens to hundreds of rendering passes. Today, almost every new computer comes with graphics hardware
capable of interactively executing shaders of thousands to tens of thousands of instructions. This course has been
redesigned to address today's real-time shading capabili ... 
Renju Thomas, Manoj Franklin, Chris Wilkerson, Jared Stark

May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th annual international symposium on Computer architecture ISCA '03, volume 31 issue 2
Publisher: ACM Press 

Full text available: pdf(169.91 KB) Additional Information: full citation, abstract, references, citings

Deep pipelines and fast clock rates are necessitating the development of high accuracy, multi-stage branch 
predictors for future processors. Such a predictor uses a collection of predictors, each of which provides its 
predictions at a different stage of the pipeline front-end. A simple 1-cycle latency line predictor provides 
predictions in the first stage, followed in a couple of stages later by predictions from a more accurate global 
predictor. Finally, one or two stages later, a highly accurat ... 
Publisher: IEEE Computer Society, ACM Press

The trace cache is a recently proposed solution to achieving high instruction fetch bandwidth by buffering and 
reusing dynamic instruction traces. This work presents a new block-based trace cache implementation that can 
achieve higher IPC performance with more efficient storage of traces. Instead of explicitly storing instructions of a
trace, pointers to blocks constituting a trace are stored in a much smaller trace table. The block-based trace cache
renames fetch addresses at the basic block le ...



14 Type-Safe linking with recursive DLLs and shared libraries
Dominic Duggan 

November 2002 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 24 issue 6 
Publisher: ACM Press 

Full text available: pdf(658.62 KB) Additional Information: full citation, abstract, references, citings, index terms

Component-based programming is an increasingly prevalent theme in software development, motivating the need
for expressive and safe module interconnection languages. Dynamic linking is an important requirement for 
module interconnection languages, as exemplified by dynamic link libraries (DLLs) and class loaders in operating 
systems and Java, respectively. A semantics is given for a type-safe module interconnection language that 
supports shared libraries and dynamic linking, as well as circular ... 
October 2003 ACM Transactions on Computational Logic (TOCL), volume 4 issue 4
Publisher: ACM Press 

Full text available: pdf(610.28 KB) Additional Information: full citation, abstract, references, index terms

We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every such algorithm
can be simulated, step for step, by an abstract state machine with a background that provides for multisets. 

Keywords: ASM thesis, Parallel algorithm, abstract state machine, postulates for parallel computation 
George Z. Chrysos, Joel S. Emer

April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th annual international symposium on Computer architecture ISCA '98, volume 26 issue 3
Publisher: IEEE Computer Society, ACM Press 

Full text available: pdf(1.66 MB) Additional Information: full citation, abstract, references, citings, index terms



For maximum performance, an out-of-order processor must issue load instructions as early as possible, while 
avoiding memory-order violations with prior store instructions that write to the same memory location. One 
approach is to use memory dependence prediction to identify the stores upon which a load depends, and 
communicate that information to the instruction scheduler. We designate the set of stores upon which each load 
has depended as the load's "store set". The processor can discover and u ... 



18 "Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense 
Ronald D. Barnes, Shane Ryoo, Wen-mei W. Hwu 

November 2005 Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture MICRO 38

Publisher: IEEE Computer Society 

Full text available: pdf(346.82 KB) Additional Information: full citation, abstract

As microprocessor designs become increasingly power- and complexity-conscious, future microarchitectures must
decrease their reliance on expensive dynamic scheduling structures. While compilers have generally proven adept
at planning useful static instruction-level parallelism, relying solely on the compiler's instruction execution
arrangement performs poorly when cache misses occur, because variable latency is not well tolerated. This paper
proposes a new microarchitectural model, multipass pipel ... 



19 Link-time binary rewriting techniques for program compaction
Bjorn De Sutter, Bruno De Bus, Koen De Bosschere 

September 2005 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 27 issue 5 
Publisher: ACM Press 

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, index terms

Small program size is an important requirement for embedded systems with limited amounts of memory. We 
describe how link-time compaction through binary rewriting can achieve code size reductions of up to 62%
for statically bound languages such as C, C++, and Fortran, without compromising on performance. We
demonstrate how the limited amount of information about a program at link time can be exploited to overcome 
overhead resulting from separate compilation. This is done with sc ... 

Keywords: Program representation, binary rewriting, code abstraction, compaction, interprocedural analysis, 
linker, whole-program optimization 
Full text available: pdf(502.42 KB) Additional Information: full citation, abstract, references, citings, index terms, review

We present practical approximation methods for computing and representing interprocedural aliases for a program
written in a language that includes pointers, reference parameters, and recursion. We present the following 
contributions: (1) a framework for interprocedural pointer alias analysis that handles function pointers by 
constructing the program call graph while alias analysis is being performed; (2) a flow-sensitive interprocedural 
pointer alias analysis algorithm; (3 ... 